Cost of dynamical quark simulations: O(a) improved Wilson fermions*

H. Wittig (for the UKQCD, QCDSF and ALPHA Collaborations)

Division of Theoretical Physics, Department of Mathematical Sciences, University of Liverpool, UK

I report on cost estimates and algorithmic performance in simulations using 2 flavours of non-perturbatively 
O(a) improved Wilson quarks together with the Wilson plaquette action. 
Several collaborations have employed O(a) improved Wilson fermions, following the non-perturbative determination of the clover coefficient $c_{\rm sw}$ for $N_f = 2$ flavours in the range $\beta \geq 5.2$ […]. UKQCD, QCDSF and JLQCD perform simulations on physically large volumes ($L \gtrsim 1.5$ fm) using periodic boundary conditions […], whereas ALPHA use Schrödinger functional (SF) boundary conditions on small volumes ($L \lesssim 1$ fm) […]. In the following I shall discuss the two types of simulations separately.

1. RESULTS FROM UKQCD & QCDSF 

Run parameters for the UKQCD and QCDSF simulations are listed in Table 1 of […] and Table 2 of […], respectively. Additional runs, which are included here, have since been performed, and their details will be published elsewhere.

Integrated autocorrelation times, $\tau_{\rm int}$, for hadron masses are poorly known. One therefore relies on $\tau_{\rm int}$ estimated from the average plaquette, which shows an unexpected slight decrease for smaller quark masses (see Table II of […]). Given the poor understanding of autocorrelation times, UKQCD have chosen a constant separation of 40 HMC trajectories between "independent" configurations for all data sets. Thus, any scaling of $\tau_{\rm int}$ with the quark mass (or, equivalently, $am_\pi$) has not been folded into the cost analysis.
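To make the quantity $\tau_{\rm int}$ concrete, here is a minimal Python sketch of the standard windowed estimator applied to a Monte Carlo history such as the average plaquette per trajectory. The self-consistent window $W \geq c\,\tau_{\rm int}$ with $c = 6$, the convention $\tau_{\rm int} = \tfrac12 + \sum_{t=1}^{W}\rho(t)$ (so that uncorrelated data give $1/2$), the Madras-Sokal error formula and the toy AR(1) chain are all assumptions of this sketch, not necessarily the procedure used by UKQCD.

```python
import math
import random


def autocorrelation(series, t):
    """Normalised autocorrelation function rho(t) of a Monte Carlo history at lag t."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + t] - mean) for i in range(n - t)) / (n - t)
    return cov / var


def tau_int(series, c=6.0):
    """Integrated autocorrelation time, summed up to a self-consistent window W >= c * tau.

    Convention (assumed): tau = 1/2 + sum_{t=1}^{W} rho(t).
    Returns (tau, W, error) with the Madras-Sokal estimate
    error ~ tau * sqrt(2 * (2W + 1) / N).
    """
    n = len(series)
    tau, w = 0.5, 0
    for w in range(1, n // 2):
        tau += autocorrelation(series, w)
        if w >= c * tau:  # stop once the window is large compared with tau
            break
    error = tau * math.sqrt(2.0 * (2 * w + 1) / n)
    return tau, w, error


def ar1_history(phi, n, seed=1):
    """Toy AR(1) chain x_{t+1} = phi * x_t + noise; its exact tau_int is 1/2 + phi / (1 - phi)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    tau, window, err = tau_int(ar1_history(0.8, 20000))
    print(f"tau_int = {tau:.2f} +- {err:.2f} (window W = {window})")  # exact value for this chain: 4.5
```

Applied to a real plaquette history, an estimate of this kind (with its error) is what would be needed to replace the fixed separation of 40 trajectories by one that tracks the quark mass.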

The number of operations per independent configuration is modelled according to

$$\frac{N_{\rm op}}{\text{ind. cfg.}} = C \left(\frac{L}{a}\right)^{z_1} \left(\frac{1}{am_\pi}\right)^{z_2}. \tag{1}$$
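As a minimal sketch, eq. (1) can be transcribed directly into Python; the function names are hypothetical, and $C$ is taken in flop so that the result is an operation count. Because the model is a product of power laws, cost ratios depend only on the exponents, which makes it easy to see how the effort grows when $L/a$ is increased or $am_\pi$ is reduced.

```python
def ops_per_independent_config(C, L_over_a, am_pi, z1, z2):
    """Eq. (1): N_op / ind. cfg. = C * (L/a)**z1 * (1/(a m_pi))**z2, with C in flop."""
    return C * L_over_a ** z1 * (1.0 / am_pi) ** z2


def cost_ratio(L1, am1, L2, am2, z1, z2):
    """Cost of a run at (L2/a, a m_pi = am2) relative to one at (L1/a, am1); C cancels."""
    return (L2 / L1) ** z1 * (am1 / am2) ** z2


if __name__ == "__main__":
    # Exponents z1 = 4.55 and z2 = 2.77 are the values quoted in this section.
    print(cost_ratio(16, 0.5, 24, 0.5, z1=4.55, z2=2.77))   # growing L/a from 16 to 24
    print(cost_ratio(16, 0.5, 16, 0.25, z1=4.55, z2=2.77))  # halving a m_pi
```

With these exponents, going from $L/a = 16$ to $24$ at fixed $am_\pi$ costs roughly 6.3 times more per independent configuration, and halving $am_\pi$ at fixed $L/a$ roughly 6.8 times more.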



The available run data on $L/a = 24$ provided by QCDSF are as yet not sufficient to constrain the $L/a$ dependence well enough, and hence we have used $L/a = 16$ only, setting the coefficient $z_1$ equal to the value quoted in ref. […], i.e. $z_1 = 4.55$. The performance observed by QCDSF on $L/a = 24$ is consistent with this value.

Figure 1. Fits to the cost formula using UKQCD's run data […] (full circles, solid line). The dotted lines represent the uncertainties estimated by including the open circles. (Vertical axis: Tflop per new configuration.)

Fits to the prefactor $C$ and $z_2$ in eq. (1) were performed to the UKQCD subset of the data. Update times were converted into operation counts $N_{\rm op}$ using the CRAY T3E's sustained speed of 275 Mflops per processor (32-bit arithmetic, assembler code). This yields



$$C = 0.31(7)\ \text{Gflop}, \qquad z_2 = 2.77(40), \tag{2}$$
where the errors have been estimated by fitting different subsets of data points. The corresponding curves are shown in Fig. 1. The estimate for $z_2$ agrees with the value quoted in […], whereas the prefactor $C$ in eq. (1) is roughly 10 times larger. The reason for this is so far unknown.
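The fit behind eq. (2) is not spelled out beyond the fact that $z_1$ was held fixed at 4.55. One plausible way to do it, sketched below in Python under that assumption, is to convert each measured update time into an operation count with the quoted 275 Mflops per T3E processor and then perform a linear least-squares fit of $\log N_{\rm op}$ against $\log(1/am_\pi)$. The helper names, the unweighted log-space fit and the leave-one-out stand-in for "fitting different subsets" are illustrative assumptions, not UKQCD's actual analysis.

```python
import math

# Sustained CRAY T3E speed quoted in the text (32-bit arithmetic, assembler code).
T3E_FLOPS_PER_PROC = 275e6


def update_time_to_ops(seconds, n_proc, flops_per_proc=T3E_FLOPS_PER_PROC):
    """Convert wall-clock seconds per independent configuration into an operation count."""
    return seconds * n_proc * flops_per_proc


def fit_C_z2(points, L_over_a, z1):
    """Unweighted least-squares fit of eq. (1) in log space at fixed z1.

    points: list of (a m_pi, N_op) pairs at a single L/a.
    Model: log N_op - z1 log(L/a) = log C + z2 log(1/(a m_pi)).
    Returns (C, z2).
    """
    xs = [math.log(1.0 / am) for am, _ in points]
    ys = [math.log(n_op) - z1 * math.log(L_over_a) for _, n_op in points]
    n = len(points)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    z2 = sxy / sxx
    C = math.exp(y_mean - z2 * x_mean)
    return C, z2


def subset_spread(points, L_over_a, z1):
    """Range of (C, z2) over leave-one-out subsets; needs at least three points."""
    fits = [fit_C_z2(points[:i] + points[i + 1:], L_over_a, z1) for i in range(len(points))]
    cs = [c for c, _ in fits]
    z2s = [z for _, z in fits]
    return (min(cs), max(cs)), (min(z2s), max(z2s))
```

With real timings, the ranges returned by `subset_spread` would play the role of the quoted uncertainties in eq. (2); on exact power-law input they collapse to a single value.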

In order to estimate the CPU effort required to repeat the quenched benchmark for the light hadron spectrum […], we assume that the smallest lattice spacing and the smallest quark mass each account for 50% of the total. Furthermore, O(a) improvement implies that one can use larger lattice spacings without compromising the continuum extrapolations. The following estimates are based on 400 configurations at a smallest lattice spacing of $a = 0.07$ fm, with $L/a = 48$, and a minimum pion mass of $am_\pi^{\rm min}$, corresponding to a dynamical quark mass $m_q$:
2. RESULTS FROM ALPHA

ALPHA simulate massless quarks on small volumes with SF boundary conditions. The box size $L$ is thus the only scale in the problem. In particular, the condition number of the fermion matrix is determined by $L$, which implies that the role of $1/am_\pi$ in eq. (1) is taken over by $L/a$. The appropriate cost formula for the SF is therefore

$$\frac{N_{\rm op}}{\text{ind. cfg.}} = C \left(\frac{L}{a}\right)^{z}. \tag{3}$$

A detailed algorithmic study, including a cost analysis, has been published in [5]. ALPHA were able to extract precise autocorrelation data for the relevant observable, i.e. the running coupling in the SF scheme, $\bar{g}^2_{\rm SF}(L)$. For $\bar{g}^2_{\rm SF} \approx 1$ one finds $\tau_{\rm int} \approx 2$ trajectories, with a relative error of 5–10%.

ALPHA have used an alternative measure of the cost of their simulations. The quantity $M_{\rm cost}$ defined in eq. (3.1) of […] is expected to differ from eq. (3) by an overall factor $(L/a)^3$. Fig. 2, taken from […], shows a plot of $M_{\rm cost}$ versus $L/a$ for $\bar{g}^2_{\rm SF}(L) \approx 1$. It suggests a scaling of $M_{\rm cost} \propto (L/a)^3$ (dashed line in Fig. 2), which implies $z \approx 6$ in eq. (3).
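The step from Fig. 2 to $z \approx 6$ is a log-log slope plus the conversion factor $(L/a)^3$. A minimal Python sketch, assuming as stated above that $N_{\rm op}/\text{ind. cfg.} \propto M_{\rm cost}\,(L/a)^3$; the function names and the test values are hypothetical, and no ALPHA data are reproduced here.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x), i.e. the exponent p of a power law y ~ x**p."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def sf_exponent(L_over_a, m_cost, extra_power=3.0):
    """Exponent z of eq. (3), assuming N_op / ind. cfg. = const * M_cost * (L/a)**extra_power."""
    return loglog_slope(L_over_a, m_cost) + extra_power


def sf_cost_ratio(L1, L2, z):
    """Eq. (3): cost at L2/a relative to L1/a; the prefactor C cancels."""
    return (L2 / L1) ** z
```

For instance, with $z = 6$ going from $L/a = 16$ to $L/a = 20$ costs $(20/16)^6 = 1.25^6 \approx 3.8$ times more per independent configuration.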

The benchmark for ALPHA is the determination of the running of $\alpha_s$ and the extraction of the $\Lambda$-parameter. The results of […] imply that lattice sizes of $L/a = 16$–$20$ should be sufficient to determine the step scaling function $\sigma$ for $N_f = 2$ flavours with an accuracy similar to that in […]. The total CPU effort is estimated to be of the order of 0.1 Tflops-years, which is within reach on machines like APE1000. This estimate does not, however, include the computation of a low-energy scale, such as […], for $N_f = 2$, which is necessary to express $\Lambda$ in physical units.

Figure 2. Cost versus $L/a$ from ref. […].
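The unit "Tflops-years" used above is a sustained speed multiplied by a running time. A short Python sketch of the conversion, assuming a Julian year of 365.25 days; the helper names are hypothetical, and the machine speed passed to `years_on_machine` is a free parameter, since no sustained APE1000 figure is quoted here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumption of this sketch


def tflops_years_to_ops(tflops_years):
    """Total operation count corresponding to a given number of Tflops-years."""
    return tflops_years * 1e12 * SECONDS_PER_YEAR


def ops_to_tflops_years(total_ops):
    """Inverse conversion, e.g. for (number of configurations) * (N_op per configuration)."""
    return total_ops / (1e12 * SECONDS_PER_YEAR)


def years_on_machine(tflops_years, sustained_tflops):
    """Running time in years on a machine sustaining `sustained_tflops` Tflops."""
    return tflops_years / sustained_tflops
```

For the ALPHA estimate, `tflops_years_to_ops(0.1)` corresponds to about $3.2\times10^{18}$ operations; on a hypothetical machine sustaining 0.5 Tflops this would take `years_on_machine(0.1, 0.5)`, i.e. 0.2 years.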